Choropleth mapping

0.1 Introduction

So far, we have largely focused on the non-geographic data set, mtcars. As someone engaged with a geographic discipline, it is highly likely that you will be engaging geographical data frames. Such data sets have the advantage of being able to produce cartographic representations - maps!

Principally, R is a statistical script-based programming language without mapping capabilities. To produce maps in R you’ll need to read in a package called sf. sf is an abbreviation of Simple Features (Pebesma and Bivand 2023) that handles a data-frame objects with simplified code-lines including geographic formats.

Maps are initially complex to make in R but offer another way of describing data with geographic properties.

0.2 Shapefile basics

Geographic data come in variety of formats but essentially categories as either being vector-based or rasterised. For now, we’ll focus on manipulating a vector data frame containing polygons. First, to load sf if you’ll need to install package and then load its libraries into your R session. A way doing this with minimal as is as follows:

Code
if(!require(sf)) install.packages("sf")
Loading required package: sf
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
Tip

When running a line of code similar that above, in a script, you may need to run it twice for the package to install and then to load. To avoid this, you can also use the following approaches:

Code
install.packages("sf")
require(sf)

If you have already have the package installed you may want to comment-out the first line by placing a hashtag before it:

Code
#install.packages("sf")
require(sf)

An example of geographic data can Tenure types recorded in Bristol, at the Lower Super Output Area (LSOA) level: https://raw.githubusercontent.com/Richtea84/I2Q-files/main/Bristol_Tenure.csv

0.2.1 Loading the CSV into R

It is possible to load the CSV into R as an object using the weblink above in using the read.csv(...) function. Let’s call the Object ‘Bristol_data’.

Code
Bristol_data <- read.csv(file = "https://raw.githubusercontent.com/Richtea84/I2Q-files/main/Bristol_Tenure.csv")

You can preview the results using the head(...) function. For example:

Code
head(x = Bristol_data, n = 5)
  X2021.super.output.area...lower.layer Total..All.households Owned
1              E01014601 : Bristol 001A                   746   474
2              E01014602 : Bristol 001B                   827   353
3              E01014603 : Bristol 001C                  1068   604
4              E01014605 : Bristol 001E                   835   237
5              E01032516 : Bristol 001G                   859   578
  Shared.ownership Social.rented Private.rented Lives.rent.free
1                5           187             77               3
2                6           373             95               0
3               27           195            242               0
4                1           473            120               4
5                7           139            132               3
  Owns.with.a.mortgage.or.loan.or.shared.ownership
1                                              247
2                                              196
3                                              371
4                                              137
5                                              252
  Private.rented.or.lives.rent.free
1                                80
2                                95
3                               242
4                               124
5                               135

As discussed in previous sections, there are ways in which you can change the column names - for now, we’ll leave them as they are. The data we have, is a ‘data frame’.

0.2.2 Loading Mappable data (shapefile)

As we are looking at 2021 LSOAs for Bristol, we can access the corresponding geographic boundaries from UK data service: https://borders.ukdataservice.ac.uk/

Once there do the following:

  • Select the ‘Boundary Data Selector’ tool
  • In the first ‘select’ Box choose england, and in the third, select ‘2011 and later’. Click on the ‘Find’ button.
  • In the ‘Boundaries’ box select ‘English Lower Layer Super Output Areas, 2021’.
  • List the the Areas. As of writing this page, only ‘England’ is available - this brings in the whole of the UK’s Lower Super Output Areas.
  • Ensure that the data format is set to SHAPE with an archive format of Zip and click on ‘Extract Boundary Data’.

UK Data Service Census Support’s Boundary Data Selector with relevant settings.

You may encounter further prompts related to the selection of the data download format - ensure you pick the ‘Download features in Shapefile format as Zip file’ for your selected data set when prompted.

At this point, it is important that a relevant working directory has been set up. The directory will contain both you CSV and the zipped data that you’ve downloaded. Next, boundary data is unzipped and subsequently loaded into R as a new object (called Bristol_LSOAs) using the read_sf(...) function from the sf package:

Code
Bristol_LSOAs <- read_sf(dsn = "England_lsoa_2021/england_lsoa_2021.shp")
Note

For the shapefile (.shp) to load correctly, ensure that all of the other dependencies associated with the shapefile are present. Here, these are:

  • dbf (data base)
  • prj (projection)
  • shx (geometries)

Under some circumstances, other dependencies may be present such as:

  • cpg (code page for character sets)
  • sbn (spatial indexing essential for processing the final output)
  • sbx (furhter spatial indexing information used in rendering the final output)

Whatever comes across in the download, ensure that it is present in the folder from which you are reading the .shp file.

0.2.3 Joining the CSV data to the Mappable data

The process of joining CSV data to the mappable data uses the merge(...) function in exactly the same way as before. An important detail is that the mappable data is always the first object to be called in.

First, let’s change the name of the field containing the containing the LSOA code reference (beginning ‘E01’).

Code
# Changing the name of the 'code field'
names(Bristol_data)[1] = "code"

The ‘code’ field we’ve just created contains too much information, making it impossible to match to Bristol_LSOAs data. The LSOA code is found in the first 9 characters. To shorten the contents here, we can use the substr(...) function:

  • The field we want to shorten forms the front of our argument
  • x = is the same field
  • start = specifies the start of the crop - here, it is the first character
  • stop = specifies the end of the crop - here, it is the ninth character
Code
# Shortening the field to the first 9 letters
Bristol_data$code <- substr(x = Bristol_data$code, start = 1, stop = 9)

With the fields properly prepared, we are able to conduct the join using the merge(...) function. Again, note that the geographic data comes first (x=).

Code
# Performing the merge
Bristol_merged <- merge(x = Bristol_LSOAs, y = Bristol_data, by.x = "lsoa21cd", by.y = "code")

0.2.4 Previewing

With sf loaded, it is possible to preview the result of the join using the plot function. Let’s examine the first 6 plots using a slicing:

Code
plot(Bristol_merged[,c(1:6)])

Figure 1: Preview of merged shapefile data

0.3 Clipping and cropping

Boundary of the Bristol featured in the LSOA data features the Avonmouth Extension (Figure 1) that protudes out into the sea due the region’s in relation to Bristol’s maritime history that saw shipping trade attributed to the large tidal range of the River Avon that enabled Bristol’s Floating Harbour; shipping traffic falls under the jurisdiction of Bristol’s port authority within the extension. However, we are only interested with the land administration component, where we will need to clip the region accordingly.

0.3.1 Dissolving

The first step is to ‘clip’ the extension off so that we are only looking at the land. This is achieved by downloading the boundaries for Bristol’s wards. The Shapefile version of the data should be downloaded and loaded following a similar process to that described in Section 0.2.2.

Let’s load the shapefile as an object called ‘Bristol_wards’ (ensuring all of the dependencies are present):

Code
Bristol_wards <- read_sf("Wards/Wards.shp")

We can plot one of the variables using the plot(...) function:

Code
plot(Bristol_wards["NAME"])

Figure 2: Bristol Wardss layer (NAME field sliced)

This good, but let’s merge all of the separate polygons into a single landmass. The sf package features a useful function that dissolves boundaries called st_union(...):

Code
Bristol_Mass <- st_union(x = Bristol_wards)
plot(Bristol_Mass, col = "palegreen")

Figure 3: The dissolved wards layer

0.3.2 Cropping

Now that we have the dissolved (or unionised) layer, we are in a position to clip off the Avonmouth Extension. Here’s we’ll use the st_clip(...) function and use the Bristol_Mass object as the mask.

  • x = The data that needs to be cropped (Bristol_merged)
  • y = The cropping background
Code
Bristol_merge_2 <- st_crop(x = Bristol_merged, y = Bristol_Mass)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Code
plot(Bristol_merge_2["Social.rented"])

A preview of Bristol’s social rentals (2021) excluding the Avonmouth Extension

At this point, we are examining continuous data where the blue tones indicate lower observation frequencies, while the yellow tones are the higher frequencies.

0.4 tmap cartography

Inspecting social rentals, we can detect areas where social rentals (i.e. council tenancies) are highest - in the middle and towards the bottom. However, it is difficult to ascertain scale and location, where this graphic lacks cartographic elements and cannot be considered a ‘map’ at this stage. To introduce cartographic elements another package needs to be introduced, tmap (Tennekes 2018). Loading this package follows a similar process to that described above:

Code
if(!require(tmap)) install.packages("tmap")
Loading required package: tmap

The parsing for tmap differs from the default R coding structure and closely aligned to an extended graphical suite called ggplot that features in other programming languages such as Python. Here’s the code for producing a basic preview for our clipped data with tmap is as follows:

Code
tm_shape(shp = Bristol_merge_2)+
  tm_polygons(col = "Social.rented")

Figure 4: Preview of social rents in Bristol using tmap
  • tm_shape(...) introduces the geographic (mappable data)
  • tm_polygons(...) specifies the geometry and its aesthetic properties.

These are linked together with + symbols.

To make this graphic cartographic, we need to include the following elements:

  • Compass: function is tm_compass(...)
  • Scale bar: function is tm_scale_bar(...)
  • Graticules (longitude and latitude lines): function is tm_graticules(...)

Adding these on can be achieved by using further + symbols in your code. An example is shown in Figure 5.

Code
tm_shape(shp = Bristol_merge_2)+
  tm_polygons(col = "Social.rented")+
  tm_compass()+
  tm_scale_bar()+
  tm_graticules()

Figure 5: A map (with cartographic features) produced in tmap

0.4.1 Aesthetic manipulation

As with the plot(...) function, colours and elements can be manipulated. You can learn more about how to customise the look and feel of your map by inserting ? in front of the graphical element that you want to manipulate.

An example of how arguments can be applied to improve the map above is as follows:

Code
tm_shape(shp = Bristol_merge_2)+
  tm_polygons(col = "Social.rented", title = "Social rented", border.alpha = 0)+
  tm_compass(position = c("right", "top"), color.dark = "navy",type = "4star", text.color = "navy")+
  tm_scale_bar(position = c("left", "bottom"), color.dark = "navy",text.color = "navy", breaks = c(0,2,4,6))+
  tm_graticules(col = "navy")+
  tm_layout(legend.outside = TRUE, legend.position = c("right","center"),
            bg.color = "grey", frame = FALSE)

Figure 6: Various aesthetic manipulations applied to out data

In addition to the placement, the map’s default colour scheme can also be manipulated. Using either a default Brewer palette, importing a palette library, or by creating your own customised palette. Here’s an example of a map produced using a custom colour palette (Figure 7)

Code
custom_pal <- c("red","grey20")

tm_shape(shp = Bristol_merge_2)+
  tm_polygons(col = "Social.rented", title = "Social rented", border.alpha = 0, palette = rev(custom_pal), n = 12)+
  tm_compass(position = c("right", "top"), color.dark = "navy",type = "4star", text.color = "white")+
  tm_scale_bar(position = c("left", "bottom"), color.dark = "navy",text.color = "white", breaks = c(0,2,4,6))+
  tm_graticules(col = "navy")+
  tm_layout(legend.outside = TRUE, legend.position = c("right","center"),
            bg.color = "grey", frame = FALSE)

Figure 7: Application of a custom colour palette

A critical step in the production of this graphic is the use of palette = argument that points to the custom colour palette. Within this argument the rev(...) function is applied to reverse the colour palette.

0.5 Basemaps

While someone with a good understanding of geometry can figure out the precise location of the featured regions, for most the inclusion of a base map is far more practical. This can be added using a package called rosm (Dunnington 2022) that has access to Open Street Map map tiles stored in a raster format. We’ll discuss rasters in another section.

Code
# Installing rosm
if(!require(rosm)) install.packages("rosm")
Loading required package: rosm

0.5.1 Changing map projection

As Open Street Map is projected to the Mercator projection (same as Google Maps), we need to change the projection of our map data (Bristol_merge_2) so that it conforms. The EPSG code we need is 4326 This can be achieved using the st_transform(...) from the sf package. Here is how it is applied in the creation of a new object simply called Bristol_reproj:

Code
Bristol_reproj <- st_transform(x = Bristol_merge_2, crs = 4326)

With the application of the Mercator projection assigned to our data, we can now retrieve a relevant base map. For this, we need the osm.raster(...) function from rosm and the st_bbox(...) function from sf to define the bounding box (or rectangular extents) of our data.

Code
my_basemap <- rosm::osm.raster(st_bbox(Bristol_reproj),)
Zoom: 12

To add the base map, a raster image, we use tm_shape(...) in combination with the tm_rgb(...) function from tmap. Here’s our base map:

Code
tm_shape(my_basemap)+
  tm_rgb()

Figure 8: The retreived rosm basemap

As with other elements in tmap the basemap can be manipulated. Let’s make it monochrome by lowering the saturation = argument to 0:

Code
tm_shape(my_basemap)+
  tm_rgb(saturation = 0)

Figure 9: The base map in monochrome, saturation set to 0

Adding our mapped data onto of this is this base map is achieved by adding the map code immediately after the base map. Transparency on for out polygon layer is achieved by introducing the alpha = argument, the selected value is 0.45:

Code
tm_shape(my_basemap)+
  tm_rgb(saturation = 0)+
  tm_shape(shp = Bristol_merge_2)+
  tm_polygons(col = "Social.rented", title = "Social rented", border.alpha = 0, palette = rev(custom_pal), n = 12, alpha = 0.45)+
  tm_compass(position = c("right", "top"), color.dark = "navy",type = "4star", text.color = "white")+
  tm_scale_bar(position = c("left", "bottom"), color.dark = "navy",text.color = "white", breaks = c(0,2,4,6))+
  tm_graticules(col = "navy")+
  tm_layout(legend.outside = TRUE, legend.position = c("right","center"),
            bg.color = "grey", frame = FALSE)

Figure 10: Applying the base map to our mapped data…

As seen in Figure 10, we are able to contextualise data we’ve imported by seeing the names of nearby areas. Looking at our web browswers it is possible to identify that Hartcliffe and areas close to Easton, Redfield, and St Philips are have high occurrences of social rented housing.

0.6 Interactive maps

A helpful feature in tmap is its ability to produce interactive web maps. These are essentially rendered using a javascript package known as leaftlet. Creating an interactive map in tmap is achieved by using the following command before your main code tmap_mode("view"). Here’s the result when applied to our map above:

Code
tmap_mode("view")
tmap mode set to interactive viewing
Code
#tm_shape(my_basemap)+
  #tm_rgb(saturation = 0)+
  tm_shape(shp = Bristol_merge_2)+
  tm_polygons(col = "Social.rented", title = "Social rented", border.alpha = 0, palette = rev(custom_pal), n = 12, alpha = 0.45)+
  tm_compass(position = c("right", "top"), color.dark = "navy",type = "4star", text.color = "white")+
  tm_scale_bar(position = c("left", "bottom"), color.dark = "navy",text.color = "white", breaks = c(0,2,4,6))+
  tm_graticules(col = "navy")+
  tm_layout(legend.outside = TRUE, legend.position = c("right","center"),
            bg.color = "grey", frame = FALSE)
Compass not supported in view mode.
legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.
Warning: In view mode, scale bar breaks are ignored.
Figure 11: An example of an interactive map

A crucial concern with this mode of mapping is the loss of cartographic elements including the graticules, and compass - this results in a number of warnings when our raw code is introduced. Since several base maps are supplied by default, we no longer have to generate a base map. To change the mode back, we use the following line of code:

Code
tmap_mode("plot")
tmap mode set to plotting

References

Dunnington, Dewey. 2022. Rosm: Plot Raster Map Tiles from Open Street Map and Other Sources. https://CRAN.R-project.org/package=rosm.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://r-spatial.org/book/.
Tennekes, Martijn. 2018. tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.